Take_home_Ex1

Modified

December 3, 2023

Overview

The recent shift in payment being made more digital, companies and organisations can more easily collect data and information that are linked to consumer habits. The transportation industry including public transport such as buses has also lean into this phenomenon. The information collected include travelling patterns that can help companies plan for more efficient routes or where heavy ridership is to be expected.

Objectives

Exploratory Spatial Data Analysis (ESDA) hold tremendous potential to address complex problems facing society. In this study, you are tasked to apply appropriate Local Indicators of Spatial Association (GLISA) and Emerging Hot Spot Analysis (EHSA) to undercover the spatial and spatio-temporal mobility patterns of public bus passengers in Singapore.

Task

Here we will utilise bus travelling data at different time duration for plotting out geospatial data and analysing them using various statistical tools.

Geovisualisation and Analysis

Computing passenger trips at the hexagonal level for the following time intervals:

  • Weekday morning peak, 6am to 9am
  • Weekday afternoon peak, 5pm to 8pm
  • Weekend/holiday morning peak, 11am to 2pm
  • Weekend/holiday evening peak, 4pm to 7pm

Display the geographical distribution using choropleth maps of the hexagons.

Combine all of the passenger trips made by all of the bus stops within a hexagon together

Local Indicators of Spatial Association(LISA) Analysis

Utilise Queen’s contiguity for performing LISA of the passenger trips by origin at hexagonal level Displat the LISA maps of the passenger trips at hexagonal level.

Load Packages and Data

Load packages

Here we will load the packages needed for this exercise and their respective functions - sf: - tmap: - spdep: - tidyverse: - dplyr: - mapview: - sfdep:

pacman::p_load(sf,tmap,spdep,tidyverse, dplyr, mapview, sfdep, stplanr, glue)

Loading data

Loading aspatial table

Here we will read all of the ridership from different bus stops in Oct 2023 and assign it to the variable.

odbus <- read_csv("data/aspatial/origin_destination_bus_202310.csv")

We will then extract the information from the following and assign them to different variables.

Day Duration Variable name
Weekdays 6am - 9am weekdayAM_6_9
Weekdays 5pm - 8pm weekdayPM_5_8
Weekends/Holidays 11am - 2pm weekendAM_11_14
Weekends/Holidays 4pm - 7pm weekendPM_4_7
weekdayAM_6_9 <- odbus %>% 
  filter(DAY_TYPE == "WEEKDAY") %>%
  filter(TIME_PER_HOUR >= 6 & TIME_PER_HOUR <= 9) %>%
  group_by(ORIGIN_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))

weekdayPM_5_8 <- odbus %>% 
  filter(DAY_TYPE == "WEEKDAY") %>%
  filter(TIME_PER_HOUR >= 17 & TIME_PER_HOUR <= 20) %>%
  group_by(ORIGIN_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))

weekendAM_11_14 <- odbus %>% 
  filter(DAY_TYPE == "WEEKENDS/HOLIDAY") %>%
  filter(TIME_PER_HOUR >= 11 & TIME_PER_HOUR <= 14) %>%
  group_by(ORIGIN_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))

weekendPM_16_19 <- odbus %>% 
  filter(DAY_TYPE == "WEEKENDS/HOLIDAY") %>%
  filter(TIME_PER_HOUR >= 17 & TIME_PER_HOUR <= 19) %>%
  group_by(ORIGIN_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))

Loading Geospatial data

Next we will import all of the bus stops and their coordinates and attached it to the busstop variable.

busstop <- st_read(dsn = "data/geospatial", layer = "BusStop") %>%
  st_transform(crs = 3414)
Reading layer `BusStop' from data source 
  `C:\Users\Lian Khye\Desktop\MITB\Geospatial\geartooth\ISSS624\Take_home_Ex1\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21

We will first rename the bus stop column title for easier data joining.

colnames(busstop)[colnames(busstop) == "BUS_STOP_N"] <- "ORIGIN_PT_CODE"

We will also import the layout of Singapore for excluding busstops that are not found in Singapore.

mpsz <- st_read(dsn = "data/geospatial",
                   layer = "MPSZ-2019") %>%
  st_transform(crs = 3414)
Reading layer `MPSZ-2019' from data source 
  `C:\Users\Lian Khye\Desktop\MITB\Geospatial\geartooth\ISSS624\Take_home_Ex1\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS:  WGS 84

After that we will create the hexagons that will create the map layout. The hexagons will be shaped 250 x 250 cell size. All of the hexagons will also be given a grid id name that can be used for identifying each individual grid.

center <- st_centroid(busstop)

area_honeycomb_grid <- st_make_grid(center, cellsize = c(250 * sqrt(3), 250 * 2), what = "polygons", square = FALSE)
honeycomb_grid_sf <- st_sf(area_honeycomb_grid) %>%
  mutate(grid_id = 1:length(lengths(area_honeycomb_grid)))

Data processing

Assigning individual bus stop to hexagons

First we will assign the bus stop point geometry data to each polygon using st_intersection(). The function assigns all of the points to a polygon by the point-set intersection of two geometries. Additional information here.

valid_busstop <- st_intersection(busstop, mpsz)
busstop_hex <- st_intersection(valid_busstop, honeycomb_grid_sf) %>%
  st_drop_geometry()

Duplication check

Here we will check for the presence of any duplication before we further process the data.

duplicate1 <- weekdayAM_6_9 %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()

duplicate2 <- weekdayPM_5_8 %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()

duplicate3 <- weekendAM_11_14 %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()

duplicate4 <- weekendPM_16_19 %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()

We can see which data points are duplicated here.

c(duplicate1,duplicate2,duplicate3,duplicate4)
$ORIGIN_PT_CODE
character(0)

$TRIPS
numeric(0)

$ORIGIN_PT_CODE
character(0)

$TRIPS
numeric(0)

$ORIGIN_PT_CODE
character(0)

$TRIPS
numeric(0)

$ORIGIN_PT_CODE
character(0)

$TRIPS
numeric(0)

Finally we only keep data points that are unique using the unique() function.

unique_weekdayAM <- unique(weekdayAM_6_9)
unique_weekdayPM <- unique(weekdayPM_5_8)
unique_weekendAM <- unique(weekendAM_11_14)
unique_weekendPM <- unique(weekendPM_16_19)

Trip tabulation

Next we will then join the variables that we created earlier that contains the total number of trips at different time intervals and the busstop_hex variable together using grid_id column title that they have in common. The total number of trips made from each hexagon is then summed up together and placed under a new column named TOT_TRIPS.

count_weekdayAM_6_9 <- left_join(unique_weekdayAM , busstop_hex) %>%
  group_by(grid_id) %>%
  summarise(TOT_TRIPS = sum(TRIPS))

count_weekdayPM_5_8 <- left_join(unique_weekdayPM , busstop_hex) %>%
  group_by(grid_id) %>%
  summarise(TOT_TRIPS = sum(TRIPS))

count_weekendAM_11_14 <- left_join(unique_weekendAM , busstop_hex) %>%
  group_by(grid_id) %>%
  summarise(TOT_TRIPS = sum(TRIPS))

count_weekendPM_16_19 <- left_join(unique_weekendPM , busstop_hex) %>%
  group_by(grid_id) %>%
  summarise(TOT_TRIPS = sum(TRIPS))

Reassign polygon information

We will the reassign the polygon information from the hexagonal map that we have created earlier.

poly_weekdayAM_6_9 <- left_join(honeycomb_grid_sf,count_weekdayAM_6_9)
poly_weekdayPM_5_8 <- left_join(honeycomb_grid_sf,count_weekdayPM_5_8)
poly_weekendAM_11_14 <- left_join(honeycomb_grid_sf,count_weekendAM_11_14)
poly_weekendPM_16_19 <- left_join(honeycomb_grid_sf,count_weekendPM_16_19)

Filter for empty trips

Following that we will filter hexagons that have no trips to obtain only valid hexagons for mapping.

grid_weekdayAM <- poly_weekdayAM_6_9 %>%
  filter(TOT_TRIPS > 0)

grid_weekdayPM <- poly_weekdayPM_5_8 %>%
  filter(TOT_TRIPS > 0)

grid_weekendAM <- poly_weekendAM_11_14 %>%
  filter(TOT_TRIPS > 0)

grid_weekendPM <- poly_weekendPM_16_19 %>%
  filter(TOT_TRIPS > 0)

Choropleth map

Here we will plot the choropleth map for the different time intervals. We will be using tmap_mode(“plot”) to create an interactive map. Although we will be coding in accessories such as the compass, they will not be displayed in the interactive map. However by writing them first, we can display them in subsequent maps once we view them in plot mode.

tmap_mode("view")
mapA <- tm_shape(grid_weekdayAM)+
  tm_fill("TOT_TRIPS", 
          style = "quantile", 
          palette = "Blues",
          title = "Passenger trips") +
  tm_layout(main.title = "Passenger trips generated Weekday 6am-9am",
            main.title.position = "center",
            main.title.size = 0.7,
            legend.height = 0.45, 
            legend.width = 0.35,
            frame = TRUE) +
  tm_borders(alpha = 0.5) +
  tm_compass(type="8star", size = 1) +
  tm_scale_bar() +
  tm_grid(alpha =0.2) +
  tm_credits("Source: Planning Sub-zone boundary from URA\n and Passenger trips data from LTA", 
             position = c("left", "bottom"))
mapA

The total number of ridership range from 1 to 357043 per hexagon. The range of the data are divided to quantile range bands for clearer distinction between ridership of each hexagon.

mapB <- tm_shape(grid_weekdayPM)+
  tm_fill("TOT_TRIPS", 
          style = "quantile", 
          palette = "Reds",
          title = "Passenger trips") +
  tm_layout(main.title = "Passenger trips generated Weekday 5pm-8pm",
            main.title.position = "center",
            main.title.size = 0.7,
            legend.height = 0.45, 
            legend.width = 0.35,
            frame = TRUE) +
  tm_borders(alpha = 0.5) +
  tm_compass(type="8star", size = 1) +
  tm_scale_bar() +
  tm_grid(alpha =0.2) +
  tm_credits("Source: Planning Sub-zone boundary from URA\n and Passenger trips data from LTA", 
             position = c("left", "bottom"))
mapB

The total number of ridership range from 1 to 568845 per hexagon.

mapC <- tm_shape(grid_weekendAM)+
  tm_fill("TOT_TRIPS", 
          style = "quantile", 
          palette = "Greens",
          title = "Passenger trips") +
  tm_layout(main.title = "Passenger trips generated Weekend/holidays 11am-2pm",
            main.title.position = "center",
            main.title.size = 0.7,
            legend.height = 0.45, 
            legend.width = 0.35,
            frame = TRUE) +
  tm_borders(alpha = 0.5) +
  tm_compass(type="8star", size = 1) +
  tm_scale_bar() +
  tm_grid(alpha =0.2) +
  tm_credits("Source: Planning Sub-zone boundary from URA\n and Passenger trips data from LTA", 
             position = c("left", "bottom"))
mapC

The total number of ridership range from 1 to 117609 per hexagon.

mapD <- tm_shape(grid_weekendPM)+
  tm_fill("TOT_TRIPS", 
          style = "quantile", 
          palette = "Purples",
          title = "Passenger trips") +
  tm_layout(main.title = "Passenger trips generated Weekend/holidays 4pm-7pm",
            main.title.position = "center",
            main.title.size = 0.7,
            legend.height = 0.45, 
            legend.width = 0.35,
            frame = TRUE) +
  tm_borders(alpha = 0.5) +
  tm_compass(type="8star", size = 1) +
  tm_scale_bar() +
  tm_grid(alpha =0.2) +
  tm_credits("Source: Planning Sub-zone boundary from URA\n and Passenger trips data from LTA", 
             position = c("left", "bottom"))
mapD

The total number of ridership range from 1 to 114410 per hexagon.

Plot maps

We can also utilise a plot mode by using tmap_mode(“plot”).

tmap_mode("plot")
mapA

This allows for other accessories to be added such as compass and scales which might be useful depending on the application or use of the map.

Choropleth map discussion

We can see that the total ridership over the weekends is lower than the weekday counterparts despite both being classified as peak hours. This difference is likely due to people travelling to or from work on the weekdays as compared to the weekends where there will be less traffic as people choose to stay at home or electing to travel to districts where it may be more accessible by other transportations such as trains.

LISA

Determining the K-nearest neighbour

As some of the data points are isolated or are far away from each other, it is better to use the adaptive distance weighting method to determine the number of neighbours. Here we will use 6 neighbours as a repesentative of 6 sides to a hexagon.

adap_weekdayAM <- grid_weekdayAM %>% 
  mutate(nb = st_knn(area_honeycomb_grid,
                     k=6),
         wt = st_weights(nb),
               .before = 1)

adap_weekdayPM <- grid_weekdayPM %>% 
  mutate(nb = st_knn(area_honeycomb_grid,
                     k=6),
         wt = st_weights(nb),
               .before = 1)

adap_weekendAM <- grid_weekendAM %>% 
  mutate(nb = st_knn(area_honeycomb_grid,
                     k=6),
         wt = st_weights(nb),
               .before = 1)

adap_weekendPM <- grid_weekendPM %>% 
  mutate(nb = st_knn(area_honeycomb_grid,
                     k=6),
         wt = st_weights(nb),
               .before = 1)

Visualise Adaptive distance map

Here we will visualise the choropleth map of the nearest neighbour for the different time intervals. We will plot it using view mode with the regular choropleth map of the left and the choropleth map with the adaptive distance weighted map on the right.

We can mouse over each of the grid in the adaptive distance weighted map to see who the neighbours are, however the neighbours seen is from the index number of the weighted neighbour variable we generated above and not the grid number.

tmap_mode("view")
adapA <- tm_shape(adap_weekdayAM)+
  tm_fill("TOT_TRIPS", 
          style = "quantile", 
          palette = "Blues",
          title = "Passenger trips")

tmap_arrange(mapA, adapA, asp=1, ncol=2)
adapB <- tm_shape(adap_weekdayPM)+
  tm_fill("TOT_TRIPS", 
          style = "quantile", 
          palette = "Reds",
          title = "Passenger trips")

tmap_arrange(mapB, adapB, asp=1, ncol=2)
adapC <- tm_shape(adap_weekendAM)+
  tm_fill("TOT_TRIPS", 
          style = "quantile", 
          palette = "Greens",
          title = "Passenger trips")

tmap_arrange(mapC, adapC, asp=1, ncol=2)
adapD <- tm_shape(adap_weekendPM)+
  tm_fill("TOT_TRIPS", 
          style = "quantile", 
          palette = "Purples",
          title = "Passenger trips")

tmap_arrange(mapD, adapD, asp=1, ncol=2)

Local Moran’s I

Next we will calculate the local Moran’s I of the total number of trips of each hexagon using the local_moran() function of the sfdep package.

lisa_weekdayAM <- adap_weekdayAM %>%
  mutate(local_moran = local_moran(
    TOT_TRIPS, nb, wt,nsim = 99),
    .before = 1) %>%
  unnest(local_moran)

lisa_weekdayPM <- adap_weekdayPM %>%
  mutate(local_moran = local_moran(
    TOT_TRIPS, nb, wt,nsim = 99),
    .before = 1) %>%
  unnest(local_moran)

lisa_weekendAM <- adap_weekendAM %>%
  mutate(local_moran = local_moran(
    TOT_TRIPS, nb, wt,nsim = 99),
    .before = 1) %>%
  unnest(local_moran)

lisa_weekendPM <- adap_weekendPM %>%
  mutate(local_moran = local_moran(
    TOT_TRIPS, nb, wt,nsim = 99),
    .before = 1) %>%
  unnest(local_moran)

The output will be a data fram containing the ii, eii, var_ii, z_ii, p_ii, p_ii_sim and p_folded_sum.

Visualisation of local Moran’s I and p-value

Next we will plot the choropleth maps using the **ii* and p_ii_sim field.

tmap_mode("plot")
map1_weekdayAM <- tm_shape(lisa_weekdayAM) +
  tm_fill("ii") + 
  tm_borders(alpha = 0.5) +
  tm_view(set.zoom.limits = c(6,8)) +
  tm_layout(main.title = "local Moran's I of TOT_TRIPS(Weekday 6am to 9am)",
            main.title.size = 0.8)

map2_weekdayAM <- tm_shape(lisa_weekdayAM) +
  tm_fill("p_ii_sim",
          breaks = c(0, 0.001, 0.01, 0.05, 1),
              labels = c("0.001", "0.01", "0.05", "Not sig")) + 
  tm_borders(alpha = 0.5) +
  tm_layout(main.title = "p-value of local Moran's I of TOT_TRIPS(Weekday 6am to 9am)",
            main.title.size = 0.8)

tmap_arrange(map1_weekdayAM, map2_weekdayAM, ncol = 2)

tmap_mode("plot")
map1_weekdayPM <- tm_shape(lisa_weekdayPM) +
  tm_fill("ii") + 
  tm_borders(alpha = 0.5) +
  tm_view(set.zoom.limits = c(6,8)) +
  tm_layout(main.title = "local Moran's I of TOT_TRIPS(Weekday 5pm to 8pm)",
            main.title.size = 0.8)

map2_weekdayPM <- tm_shape(lisa_weekdayPM) +
  tm_fill("p_ii_sim",
          breaks = c(0, 0.001, 0.01, 0.05, 1),
              labels = c("0.001", "0.01", "0.05", "Not sig")) + 
  tm_borders(alpha = 0.5) +
  tm_layout(main.title = "p-value of local Moran's I of TOT_TRIPS(Weekday 5pm to 8pm)",
            main.title.size = 0.8)

tmap_arrange(map1_weekdayPM, map2_weekdayPM, ncol = 2)

tmap_mode("plot")
map1_weekendAM <- tm_shape(lisa_weekendAM) +
  tm_fill("ii") + 
  tm_borders(alpha = 0.5) +
  tm_view(set.zoom.limits = c(6,8)) +
  tm_layout(main.title = "local Moran's I of TOT_TRIPS(Weekend/Holidays 11am to 2pm)",
            main.title.size = 0.8)

map2_weekendAM <- tm_shape(lisa_weekendAM) +
  tm_fill("p_ii_sim",
          breaks = c(0, 0.001, 0.01, 0.05, 1),
              labels = c("0.001", "0.01", "0.05", "Not sig")) + 
  tm_borders(alpha = 0.5) +
  tm_layout(main.title = "p-value of local Moran's I of TOT_TRIPS(Weekend/Holidays 11am to 2pm)",
            main.title.size = 0.8)

tmap_arrange(map1_weekendAM, map2_weekendAM, ncol = 2)

tmap_mode("plot")
map1_weekendPM <- tm_shape(lisa_weekendPM) +
  tm_fill("ii") + 
  tm_borders(alpha = 0.5) +
  tm_view(set.zoom.limits = c(6,8)) +
  tm_layout(main.title = "local Moran's I of TOT_TRIPS(Weekend/Holidays 4pm to 7pm)",
            main.title.size = 0.8)

map2_weekendPM <- tm_shape(lisa_weekendPM) +
  tm_fill("p_ii_sim",
          breaks = c(0, 0.001, 0.01, 0.05, 1),
              labels = c("0.001", "0.01", "0.05", "Not sig")) + 
  tm_borders(alpha = 0.5) +
  tm_layout(main.title = "p-value of local Moran's I of TOT_TRIPS(Weekend/Holidays 4pm to 7pm)",
            main.title.size = 0.8)

tmap_arrange(map1_weekendPM, map2_weekendPM, ncol = 2)

Visualising LISA map

Finally we will create the LISA map. It is created using the local Moran’s I and their p-values. We will be using the mean for plotting.

The data that will be created will have 4 categories made of 2 outliers and 2 clusters.

Outliers: - High-Low: High total trips with low neighbours

  • Low-High: Low total trips with high neighbours

Clusters: - High-High: High total trips with high neighbours - Low-Low: Low total trips with low neighbours

More information can be found here.

tmap_mode("view")
lisa_sig_weekdayAM <- lisa_weekdayAM  %>%
  filter(p_ii_sim < 0.05)
tmap_mode("plot")
tm_shape(lisa_weekdayAM) +
  tm_polygons() +
  tm_borders(alpha = 0.5) +
tm_shape(lisa_sig_weekdayAM) +
  tm_fill("mean") + 
  tm_borders(alpha = 0.4)

highest_grid_weekdayAM <- lisa_weekdayAM %>%
  group_by(mean) %>%
  slice_max(TOT_TRIPS) %>%
  select(10, 15,16)
HL_weekdayAM <- busstop_hex %>%
  left_join(highest_grid_weekdayAM, by = "grid_id") %>%
  filter(grepl("High-Low", mean, ignore.case = TRUE)) %>%
  select(3)%>%
  summarise(HL = paste(LOC_DESC, collapse = ", "))

LH_weekdayAM <- busstop_hex %>%
  left_join(highest_grid_weekdayAM, by = "grid_id") %>%
  filter(grepl("Low-High", mean, ignore.case = TRUE)) %>%
  select(3)%>%
  summarise(LH = paste(LOC_DESC, collapse = ", "))

HH_weekdayAM <- busstop_hex %>%
  left_join(highest_grid_weekdayAM, by = "grid_id") %>%
  filter(grepl("High-High", mean, ignore.case = TRUE)) %>%
  select(3)%>%
  summarise(Hh = paste(LOC_DESC, collapse = ", "))

LL_weekdayAM <- busstop_hex %>%
  left_join(highest_grid_weekdayAM, by = "grid_id") %>%
  filter(grepl("Low-Low", mean, ignore.case = TRUE)) %>%
  select(3)%>%
  summarise(LL = paste(LOC_DESC, collapse = ", "))

outliers_weekdayAM <- glue("The bus stops with high-low outliers are:\n {HL_weekdayAM}.\n\nThe bus stops with low-high outliers are:\n{LH_weekdayAM}")

clusters_weekdayAM <- glue("The bus stops with high-high clusters are:\n {HH_weekdayAM}.\n\nThe bus stops with low-low clusters are:\n{LL_weekdayAM}")

summ <- glue("{outliers_weekdayAM} \n\n {clusters_weekdayAM}")

summ
The bus stops with high-low outliers are:
 OPP W'LANDS CIVIC CTR, WOODLANDS REG INT, WOODLANDS REG INT, W'LANDS CIVIC CTR.

The bus stops with low-high outliers are:
OPP ROCHOR STN, ROCHOR STN, SELEGIE CTR 

The bus stops with high-high clusters are:
 BOON LAY INT.

The bus stops with low-low clusters are:
BLK 670A, BLK 672A
tmap_mode("view")
lisa_sig_weekdayPM <- lisa_weekdayPM  %>%
  filter(p_ii_sim < 0.05)
tmap_mode("plot")
tm_shape(lisa_weekdayPM) +
  tm_polygons() +
  tm_borders(alpha = 0.5) +
tm_shape(lisa_sig_weekdayPM) +
  tm_fill("mean") + 
  tm_borders(alpha = 0.4)

highest_grid_weekdayPM <- lisa_weekdayPM %>%
  group_by(mean) %>%
  slice_max(TOT_TRIPS) %>%
  select(10, 15,16)



HL_weekdayPM <- busstop_hex %>%
  left_join(highest_grid_weekdayPM, by = "grid_id") %>%
  filter(grepl("High-Low", mean, ignore.case = TRUE)) %>%
  select(3)%>%
  summarise(HL = paste(LOC_DESC, collapse = ", "))

LH_weekdayPM <- busstop_hex %>%
  left_join(highest_grid_weekdayPM, by = "grid_id") %>%
  filter(grepl("Low-High", mean, ignore.case = TRUE)) %>%
  select(3)%>%
  summarise(LH = paste(LOC_DESC, collapse = ", "))

HH_weekdayPM <- busstop_hex %>%
  left_join(highest_grid_weekdayPM, by = "grid_id") %>%
  filter(grepl("High-High", mean, ignore.case = TRUE)) %>%
  select(3)%>%
  summarise(Hh = paste(LOC_DESC, collapse = ", "))

LL_weekdayPM <- busstop_hex %>%
  left_join(highest_grid_weekdayPM, by = "grid_id") %>%
  filter(grepl("Low-Low", mean, ignore.case = TRUE)) %>%
  select(3)%>%
  summarise(LL = paste(LOC_DESC, collapse = ", "))

outliers_weekdayPM <- glue("The bus stops with high-low outliers are:\n {HL_weekdayPM}.\n\nThe bus stops with low-high outliers are:\n{LH_weekdayPM}")

clusters_weekdayPM <- glue("The bus stops with high-high clusters are:\n {HH_weekdayPM}.\n\nThe bus stops with low-low clusters are:\n{LL_weekdayPM}")

summ <- glue("{outliers_weekdayPM} \n\n {clusters_weekdayPM}")

summ
The bus stops with high-low outliers are:
 OPP W'LANDS CIVIC CTR, WOODLANDS REG INT, WOODLANDS REG INT, W'LANDS CIVIC CTR.

The bus stops with low-high outliers are:
BLK 515, OPP BLK 515, BLK 638, BLK 612, RIS GRANDEUR, BLK 613, OPP RIS GRANDEUR 

The bus stops with high-high clusters are:
 BOON LAY INT.

The bus stops with low-low clusters are:
OPP BLK 482, BLK 405, BLK 479, BLK 467, OPP BLK 479
tmap_mode("view")
lisa_sig_weekendAM <- lisa_weekendAM  %>%
  filter(p_ii_sim < 0.05)
tmap_mode("plot")
tm_shape(lisa_weekendAM) +
  tm_polygons() +
  tm_borders(alpha = 0.5) +
tm_shape(lisa_sig_weekendAM) +
  tm_fill("mean") + 
  tm_borders(alpha = 0.4)

highest_grid_weekendAM <- lisa_weekendAM %>%
  group_by(mean) %>%
  slice_max(TOT_TRIPS) %>%
  select(10, 15,16)



HL_weekendAM <- busstop_hex %>%
  left_join(highest_grid_weekendAM, by = "grid_id") %>%
  filter(grepl("High-Low", mean, ignore.case = TRUE)) %>%
  select(3)%>%
  summarise(HL = paste(LOC_DESC, collapse = ", "))

LH_weekendAM <- busstop_hex %>%
  left_join(highest_grid_weekendAM, by = "grid_id") %>%
  filter(grepl("Low-High", mean, ignore.case = TRUE)) %>%
  select(3)%>%
  summarise(LH = paste(LOC_DESC, collapse = ", "))

HH_weekendAM <- busstop_hex %>%
  left_join(highest_grid_weekendAM, by = "grid_id") %>%
  filter(grepl("High-High", mean, ignore.case = TRUE)) %>%
  select(3)%>%
  summarise(Hh = paste(LOC_DESC, collapse = ", "))

LL_weekendAM <- busstop_hex %>%
  left_join(highest_grid_weekendAM, by = "grid_id") %>%
  filter(grepl("Low-Low", mean, ignore.case = TRUE)) %>%
  select(3)%>%
  summarise(LL = paste(LOC_DESC, collapse = ", "))

outliers_weekendAM <- glue("The bus stops with high-low outliers are:\n {HL_weekendAM}.\n\nThe bus stops with low-high outliers are:\n{LH_weekendAM}")

clusters_weekendAM <- glue("The bus stops with high-high clusters are:\n {HH_weekendAM}.\n\nThe bus stops with low-low clusters are:\n{LL_weekendAM}")

summ <- glue("{outliers_weekendAM} \n\n {clusters_weekendAM}")

summ
The bus stops with high-low outliers are:
 OPP W'LANDS CIVIC CTR, WOODLANDS REG INT, WOODLANDS REG INT, W'LANDS CIVIC CTR.

The bus stops with low-high outliers are:
JUNCTION 10, OPP JUNCTION 10 

The bus stops with high-high clusters are:
 BOON LAY INT.

The bus stops with low-low clusters are:
OPP BLK 590C, BLK 589D, BLK 590C, OPP BLK 589D
tmap_mode("view")
lisa_sig_weekendPM <- lisa_weekendPM  %>%
  filter(p_ii_sim < 0.05)
tmap_mode("plot")
tm_shape(lisa_weekendPM) +
  tm_polygons() +
  tm_borders(alpha = 0.5) +
tm_shape(lisa_sig_weekendPM) +
  tm_fill("mean") + 
  tm_borders(alpha = 0.4)

highest_grid_weekendPM <- lisa_weekendPM %>%
  group_by(mean) %>%
  slice_max(TOT_TRIPS) %>%
  select(10, 15,16)



HL_weekendPM <- busstop_hex %>%
  left_join(highest_grid_weekendPM, by = "grid_id") %>%
  filter(grepl("High-Low", mean, ignore.case = TRUE)) %>%
  select(3)%>%
  summarise(HL = paste(LOC_DESC, collapse = ", "))

LH_weekendPM <- busstop_hex %>%
  left_join(highest_grid_weekendPM, by = "grid_id") %>%
  filter(grepl("Low-High", mean, ignore.case = TRUE)) %>%
  select(3)%>%
  summarise(LH = paste(LOC_DESC, collapse = ", "))

HH_weekendPM <- busstop_hex %>%
  left_join(highest_grid_weekendPM, by = "grid_id") %>%
  filter(grepl("High-High", mean, ignore.case = TRUE)) %>%
  select(3)%>%
  summarise(Hh = paste(LOC_DESC, collapse = ", "))

LL_weekendPM <- busstop_hex %>%
  left_join(highest_grid_weekendPM, by = "grid_id") %>%
  filter(grepl("Low-Low", mean, ignore.case = TRUE)) %>%
  select(3)%>%
  summarise(LL = paste(LOC_DESC, collapse = ", "))

outliers_weekendPM <- glue("The bus stops with high-low outliers are:\n {HL_weekendPM}.\n\nThe bus stops with low-high outliers are:\n{LH_weekendPM}")

clusters_weekendPM <- glue("The bus stops with high-high clusters are:\n {HH_weekendPM}.\n\nThe bus stops with low-low clusters are:\n{LL_weekendPM}")

summ <- glue("{outliers_weekendPM} \n\n {clusters_weekendPM}")

summ
The bus stops with high-low outliers are:
 OPP W'LANDS CIVIC CTR, WOODLANDS REG INT, WOODLANDS REG INT, W'LANDS CIVIC CTR.

The bus stops with low-high outliers are:
TG KATONG GIRLS' SCH, OPP TG KATONG GIRLS' SCH 

The bus stops with high-high clusters are:
 BOON LAY INT.

The bus stops with low-low clusters are:
CYCLE & CARRIAGE, OPP EUNOS TECHNOLINK, AFT REGENT MOTORS, EUNOS TECHNOLINK

Additional processing

Passenger flow visualisation

Origin and destination grouping

We can also take a look at passenger glow between the different hexes by plotting the desire lines. We will first need to obtain the total number of trips of each intervals that are grouped by their origin followed by their destination.

des_weekdayAM_6_9 <- odbus %>% 
  filter(DAY_TYPE == "WEEKDAY") %>%
  filter(TIME_PER_HOUR >= 6 & TIME_PER_HOUR <= 9) %>%
  group_by(ORIGIN_PT_CODE,
           DESTINATION_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))

des_weekdayPM_5_8 <- odbus %>% 
  filter(DAY_TYPE == "WEEKDAY") %>%
  filter(TIME_PER_HOUR >= 17 & TIME_PER_HOUR <= 20) %>%
  group_by(ORIGIN_PT_CODE,
           DESTINATION_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))

des_weekendAM_11_14 <- odbus %>% 
  filter(DAY_TYPE == "WEEKENDS/HOLIDAY") %>%
  filter(TIME_PER_HOUR >= 11 & TIME_PER_HOUR <= 14) %>%
  group_by(ORIGIN_PT_CODE,
           DESTINATION_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))

des_weekendPM_16_19 <- odbus %>% 
  filter(DAY_TYPE == "WEEKENDS/HOLIDAY") %>%
  filter(TIME_PER_HOUR >= 17 & TIME_PER_HOUR <= 19) %>%
  group_by(ORIGIN_PT_CODE,
           DESTINATION_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))

Joining destination table and grid

Since we have drop

Using the grid that we have created in the beginning, we will now join the data that contains the destination together with the grid.

combn_des_weekdayAM_6_9 <- left_join(des_weekdayAM_6_9,busstop_hex) %>%
  rename(ORIGIN_BS = ORIGIN_PT_CODE,
         ORIGIN_GRID = grid_id,
         DESTIN_BS = DESTINATION_PT_CODE)

combn_des_weekdayPM_5_8 <- left_join(des_weekdayPM_5_8,busstop_hex) %>%
  rename(ORIGIN_BS = ORIGIN_PT_CODE,
         ORIGIN_GRID = grid_id,
         DESTIN_BS = DESTINATION_PT_CODE)

combn_des_weekendAM_11_14 <- left_join(des_weekendAM_11_14,busstop_hex) %>%
  rename(ORIGIN_BS = ORIGIN_PT_CODE,
         ORIGIN_GRID = grid_id,
         DESTIN_BS = DESTINATION_PT_CODE)

combn_des_weekendPM_16_19 <- left_join(des_weekendPM_16_19,busstop_hex) %>%
  rename(ORIGIN_BS = ORIGIN_PT_CODE,
         ORIGIN_GRID = grid_id,
         DESTIN_BS = DESTINATION_PT_CODE)

Removing duplicates

We will then remove any duplicates using the unique() function.

uni_des_weekdayAM_6_9 <- unique(combn_des_weekdayAM_6_9)
uni_des_weekdayPM_5_8 <- unique(combn_des_weekdayPM_5_8)
uni_des_weekendAM_11_14 <- unique(combn_des_weekendAM_11_14)
uni_des_weekendPM_16_19 <- unique(combn_des_weekendPM_16_19)

Rejoin hexagonal information

We can then rejoin the hexagonal information back using left_join() function.

poly_des_weekdayAM_6_9 <- left_join(uni_des_weekdayAM_6_9 , busstop_hex,
            by = c("DESTIN_BS" = "ORIGIN_PT_CODE"))
poly_des_weekdayPM_5_8 <- left_join(uni_des_weekdayPM_5_8 , busstop_hex,
            by = c("DESTIN_BS" = "ORIGIN_PT_CODE"))
poly_des_weekendAM_11_14 <- left_join(uni_des_weekendAM_11_14 , busstop_hex,
            by = c("DESTIN_BS" = "ORIGIN_PT_CODE"))
poly_des_weekendPM_16_19 <- left_join(uni_des_weekendPM_16_19 , busstop_hex,
            by = c("DESTIN_BS" = "ORIGIN_PT_CODE"))

Recheck unique data

We will then recheck for unique data points by running unique() again.

uniP_des_weekdayAM_6_9 <- unique(poly_des_weekdayAM_6_9)
uniP_des_weekdayPM_5_8 <- unique(poly_des_weekdayPM_5_8)
uniP_des_weekendAM_11_14 <- unique(poly_des_weekendAM_11_14)
uniP_des_weekendPM_16_19 <- unique(poly_des_weekendPM_16_19)

Sum total trips

We will then sum up the total number of trips made from a bus stop of origin to the destination using group_by() for sorting. Putting multiple arguments into the function allows for the sub categorising the data. This way we can track the drop off point of the passengers.

ori_des_weekdayAM_6_9 <- uniP_des_weekdayAM_6_9 %>%
  rename(DESTIN_GRID = grid_id) %>%
  drop_na() %>%
  group_by(ORIGIN_GRID, DESTIN_GRID) %>%
  summarise(PEAK = sum(TRIPS))

ori_des_weekdayPM_5_8 <- uniP_des_weekdayPM_5_8 %>%
  rename(DESTIN_GRID = grid_id) %>%
  drop_na() %>%
  group_by(ORIGIN_GRID, DESTIN_GRID) %>%
  summarise(PEAK = sum(TRIPS))

ori_des_weekendAM_11_14 <- uniP_des_weekendAM_11_14 %>%
  rename(DESTIN_GRID = grid_id) %>%
  drop_na() %>%
  group_by(ORIGIN_GRID, DESTIN_GRID) %>%
  summarise(PEAK = sum(TRIPS))

ori_des_weekendPM_16_19 <- uniP_des_weekendPM_16_19 %>%
  rename(DESTIN_GRID = grid_id) %>%
  drop_na() %>%
  group_by(ORIGIN_GRID, DESTIN_GRID) %>%
  summarise(PEAK = sum(TRIPS))

Visualisation

We will first remove any intra-hexagonal travel.

R_weekdayAM_6_9 <- ori_des_weekdayAM_6_9[ori_des_weekdayAM_6_9$ORIGIN_GRID!=ori_des_weekdayAM_6_9$DESTIN_GRID,]
R_weekdayPM_5_8 <- ori_des_weekdayPM_5_8[ori_des_weekdayPM_5_8$ORIGIN_GRID!=ori_des_weekdayPM_5_8$DESTIN_GRID,]
R_weekendAM_11_14 <- ori_des_weekendAM_11_14[ori_des_weekendAM_11_14$ORIGIN_GRID!=ori_des_weekendAM_11_14$DESTIN_GRID,]
R_weekendPM_16_19 <- ori_des_weekendPM_16_19[ori_des_weekendPM_16_19$ORIGIN_GRID!=ori_des_weekendPM_16_19$DESTIN_GRID,]

Create desire lines

Next we will visualise all of the flow or connections between different bus stops from a subzone to another using the od2line() function from the stplanr package.

flow_weekdayAM_6_9 <- od2line(flow = R_weekdayAM_6_9, 
                    zones = honeycomb_grid_sf,
                    zone_code = "grid_id")

flow_weekdayPM_5_8 <- od2line(flow = R_weekdayPM_5_8, 
                    zones = honeycomb_grid_sf,
                    zone_code = "grid_id")

flow_weekendAM_11_14 <- od2line(flow = R_weekendAM_11_14, 
                    zones = honeycomb_grid_sf,
                    zone_code = "grid_id")

flow_weekendPM_16_19 <- od2line(flow = R_weekendPM_16_19, 
                    zones = honeycomb_grid_sf,
                    zone_code = "grid_id")

Visualise desire lines

We can then visualise this flow using the following code. Since the passenger flow is high, it is better to limit the visualisation. Here we will use a flow passenger flow from hexagons to hexagons of 1500 or more.

tmap_mode("view")
tm_shape(grid_weekdayAM) +
  tm_polygons() +
flow_weekdayAM_6_9 %>%  
  filter(PEAK >= 1500) %>%
tm_shape() +
  tm_lines(lwd = "PEAK",
           style = "quantile",
           scale = c(0.1, 1, 3, 5, 7, 10),
           n = 6,
           alpha = 0.3)
tmap_mode("view")
tm_shape(grid_weekdayPM) +
  tm_polygons() +
flow_weekdayPM_5_8 %>%  
  filter(PEAK >= 1500) %>%
tm_shape() +
  tm_lines(lwd = "PEAK",
           style = "quantile",
           scale = c(0.1, 1, 3, 5, 7, 10),
           n = 6,
           alpha = 0.3)
tmap_mode("view")
tm_shape(grid_weekendAM) +
  tm_polygons() +
flow_weekendAM_11_14 %>%  
  filter(PEAK >= 1500) %>%
tm_shape() +
  tm_lines(lwd = "PEAK",
           style = "quantile",
           scale = c(0.1, 1, 3, 5, 7, 10),
           n = 6,
           alpha = 0.3)
tmap_mode("view")
tm_shape(grid_weekendPM) +
  tm_polygons() +
flow_weekendPM_16_19 %>%  
  filter(PEAK >= 1500) %>%
tm_shape() +
  tm_lines(lwd = "PEAK",
           style = "quantile",
           scale = c(0.1, 1, 3, 5, 7, 10),
           n = 6,
           alpha = 0.3)